import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.ticker as mtick
import plotly.express as px
import plotly.graph_objects as go
import geopandas as gpd
import warnings
warnings.filterwarnings("ignore")
from IPython.display import HTML, display
import pprint
from scipy import stats
import scipy.stats as ss
import statsmodels.api as sa
# Create toggle cell button
font = "Roboto-Regular.ttf"
pp = pprint.PrettyPrinter(indent=4, width=100)
HTML('''
<style>
.output_png {
display: table-cell;
text-align: center;
vertical-align: middle;
}
.output {
display: flex;
align-items: left;
text-align: justify;
}
</style>
<script>
code_show=true;
function code_toggle() {
if (code_show){
$('div.input').hide();
} else {
$('div.input').show();
}
code_show = !code_show
}
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit"
value="Click here to toggle on/off the raw code."></form>
''')
Yuka Saso, Hidilyn Diaz, Nesthy Petecio. These women have all excelled in their sports and brought honor and glory to the Philippines. Consider, too, that their sports (golf, weightlifting, and boxing) have traditionally been thought of as “men’s sports”.
The achievements of Filipinas have been highlighted a lot in the recent past. And their achievements, while welcome, have once again raised comments such as “See! Women are superior to men”.
Comparisons between the genders have always been a favorite topic of discussion, usually accompanied by the consumption of alcohol. And once the comparisons begin, the usual arguments start to flow as well. Some arguments seem to be taken for granted and some are disputable.
“Men drink more alcohol”, “Men consume more rice”, “Men are the bread-winners”. The list goes on.
Obviously, the debate about which is the better gender will continue to rage on. Our group does not aim to prove or disprove which gender is better; rather, we want to bring more clarity to common gender biases by studying single men and women in the Philippines.
There are many biases revolving around gender. While the physical differences between the genders are more obvious, there are other alleged differences between the genders which sometimes border on myths and sexism. Using the 2015 Family Income and Expenditure Survey (FIES) of the Philippines, our group wanted to shed further light on these supposed differences and help dispel or support these beliefs.
By helping to delineate the differences and non-differences between the genders, our study hopes to create more connections between the genders. We can show through this study that while we are different in many ways, there are also many ways that we are the same. Hopefully, we can embrace these similarities and differences.
The FIES surveyed 41,544 households across the country. Interestingly, among those surveyed, 4.8% said they live alone and 4.7% said they have never been married. Combining these two conditions, 1.8% of the respondents have never been married and live alone. Our analysis focuses on this demographic, the single and alone, in the hope of settling some of the debates between men and women.
fies = pd.read_csv('df2015_complete.csv')
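n_total = len(fies)
n_alone = (fies['fsize'] == 1.0).sum()
n_single = (fies['ms'] == 1.0).sum()
n_both = ((fies['ms'] == 1.0) & (fies['fsize'] == 1.0)).sum()
# Second row holds the remainder, so each pie wedge is a true share of all households
count = pd.DataFrame({'Family_size': [n_alone, n_total - n_alone],
                      'Marital Status': [n_single, n_total - n_single],
                      'Alone': [n_both, n_total - n_both]})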
count['status'] = ['single', 'not single']
count2 = count.melt(id_vars=('status'))
colors = ['#0f4c81','#dddddd']
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(16, 10))
ax1.pie(count['Family_size'], labels=['Living alone', ''], colors = colors,
wedgeprops={'width':0.4}, textprops={'fontsize': 12})
ax1.text(0, 0.05, "4.8%", ha='center', va='center', fontsize=38)
ax1.text(0, -0.25, "(1,993)", ha='center', va='center', fontsize=22)
ax2.pie(count['Marital Status'], labels=['Single', ''], colors = colors,
wedgeprops={'width':0.4}, textprops={'fontsize': 12})
ax2.text(0, 0.05, "4.7%", ha='center', va='center', fontsize=38)
ax2.text(0, -0.25, "(1,942)", ha='center', va='center', fontsize=22)
ax3.pie(count['Alone'], labels=['Single &\n Alone', ''], colors = colors,
wedgeprops={'width':0.4}, textprops={'fontsize': 12})
ax3.text(0, 0.05, "1.8%", ha='center', va='center', fontsize=38)
ax3.text(0, -0.25, "(737)", ha='center', va='center', fontsize=22)
rect = plt.Rectangle(
# (lower-left corner), width, height
(0.68, 0.31), 0.26, 0.38, fill=False, color="k", lw=4,
zorder=1000, transform=fig.transFigure, figure=fig
)
fig.patches.extend([rect])
plt.suptitle('How many single people are there?',
fontsize=20,
x=0.5, y=.75)
plt.show();
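The percentages annotated on the donut charts follow directly from the household counts. A minimal sketch of that arithmetic, using only the counts quoted above (the variable names here are illustrative and are not taken from the notebook):

```python
# Counts quoted in the text and annotated on the charts above
n_households = 41_544
counts = {
    "living alone": 1_993,
    "never married": 1_942,
    "never married and living alone": 737,
}

# Share of all surveyed households, as a percentage rounded to one decimal
shares = {label: round(100 * n / n_households, 1) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

Dividing by the full household count, rather than by the sum of the two wedges, is what makes each figure a share of everyone surveyed.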
import geopandas as gpd
regions = gpd.read_file('./shapefiles/Regions.shp')
sna = fies[(fies['fsize']==1.0) & (fies['ms']==1)]
sna_df = sna.groupby(['w_regn']).size().reset_index().rename(columns={0:'singles_count'})
sex_df = sna.groupby(['w_regn', 'sex']).size().reset_index().rename(columns={0:'singles_count'})
sex_df = sex_df.pivot(index='w_regn',columns='sex')[['singles_count']].reset_index()
sex_df.columns = sex_df.columns.droplevel(0)
sex_df.columns = ['code', 'males', 'females']
reg_dct = {1: 'Ilocos Region (Region I)',
2: 'Cagayan Valley (Region II)',
3: 'Central Luzon (Region III)',
5: 'Bicol Region (Region V)',
6: 'Western Visayas (Region VI)',
7: 'Central Visayas (Region VII)',
8: 'Eastern Visayas (Region VIII)',
9: 'Zamboanga Peninsula (Region IX)',
10: 'Northern Mindanao (Region X)',
11: 'Davao Region (Region XI)',
12: 'SOCCSKSARGEN (Region XII)',
13: 'Metropolitan Manila',
14: 'Cordillera Administrative Region (CAR)',
15: 'Autonomous Region of Muslim Mindanao (ARMM)',
16: 'Caraga (Region XIII)',
41: 'CALABARZON (Region IV-A)',
42: 'MIMAROPA (Region IV-B)'}
# Two-column lookup table: region code -> region name as spelled in the shapefile
reg_df = (pd.Series(reg_dct, name='REGION')
          .rename_axis('code')
          .reset_index())
df_region = pd.merge(regions, reg_df, on='REGION')
df_region = pd.merge(df_region, sna_df, left_on='code', right_on='w_regn')
df_region = pd.merge(df_region, sex_df, on='code')
import matplotlib.gridspec as gridspec
df_region2 = df_region.copy()
df_region2['p_sngle'] = df_region2['singles_count']/sum(df_region2['singles_count'])*100
df_region2['p_sngle_men'] = df_region2['males']/sum(df_region2['males'])*100
df_region2['p_sngle_wmen'] = df_region2['females']/sum(df_region2['females'])*100
# set a variable that will call whatever column we want to visualise on the map
variable = 'singles_count'
# set the range for the choropleth
colors = 'bone_r'
#vmin, vmax = df_region["singles_count"].min(), df_region["singles_count"].max()
vmin, vmax = 0, 100
# create figure and axes for Matplotlib
fig = plt.figure(figsize=(18, 10))
gs = gridspec.GridSpec(2, 3, height_ratios=[5,1])
ax1 = fig.add_subplot(gs[0])
ax2 = fig.add_subplot(gs[1])
ax3 = fig.add_subplot(gs[2])
ax4 = fig.add_subplot(gs[1, :])
df_region.plot(column=variable,
cmap=colors,
linewidth=0.8,
ax=ax1,
edgecolor='0.8',
vmin=vmin, vmax=vmax, legend=False)
ax1.set_title('All Singles', fontsize=16)
df_region.plot(column='males',
cmap=colors,
linewidth=0.8,
ax=ax2,
edgecolor='0.8',
vmin=vmin, vmax=vmax, legend=False)
ax2.set_title('Males', fontsize=16)
df_region.plot(column='females',
cmap=colors,
linewidth=0.8,
ax=ax3,
edgecolor='0.8',
vmin=vmin, vmax=vmax,
legend=False)
ax3.set_title('Females', fontsize=16)
a = np.array([[0,100]])
img = plt.imshow(a, cmap=colors)
plt.gca().set_visible(False)
cax = plt.axes([0.2, 0.2, 0.63, 0.045])
plt.colorbar(orientation="horizontal", cax=cax, ax=ax4)
plt.title('Count of Singles', y=-1)
#pl.savefig("colorbar.pdf")
plt.suptitle("Where are the single men and ladies?", fontsize=20,
x=0.5, y=0.95)
plt.show();
Where exactly are our single and alone people? Metro Manila is home to the most, with 13% of all single and alone people surveyed. It is followed by Region 4A and Region 6, with 10% each. Trying to find your special someone? Metro Manila actually slips to second place behind Region 6 in the number of single and alone males, but it keeps the first-place trophy for single and alone women. Region 4A has the second most single and alone women, and Region 1 comes in third. Hence, the single men of Region 6 might want to visit the beaches of Batangas (Region 4A) or the surf spots of La Union (Region 1) in the hope of finding their significant other.
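The regional shares quoted above are proportions of the single-and-alone subset, overall and within each sex. A small sketch of how such shares could be computed with pandas, using a made-up table (the rows below are invented; only the column names `w_regn` and `sex` mirror the notebook):

```python
import pandas as pd

# Toy stand-in for the single-and-alone subset: one row per household.
# Region codes and sex codes (1 = male, 2 = female) are made-up values.
toy = pd.DataFrame({
    "w_regn": [13, 13, 13, 6, 6, 41, 41, 1],
    "sex":    [2, 2, 1, 1, 1, 2, 1, 2],
})

# Share of all rows that fall in each region, in percent
region_share = toy["w_regn"].value_counts(normalize=True).mul(100)

# Same share computed separately within each sex
share_by_sex = (toy.groupby("sex")["w_regn"]
                   .value_counts(normalize=True)
                   .mul(100)
                   .unstack(fill_value=0))

print(region_share.round(1))
print(share_by_sex.round(1))
```

Normalizing within each sex is what lets a region rank first for women while ranking lower for men.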
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
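# histplot with kde=True replaces the deprecated sns.distplot
sns.histplot(sna[sna['sex']==1]['age'], ax=ax1, color='#8eacd0', kde=True, stat='density')
ax1.axvline(x=np.median(sna[sna['sex']==1]['age']), color='#8eacd0')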
ax1.set_title('Single Males\n'+
f"Median = {np.median(sna[sna['sex']==1]['age'])}\n"+
f"Skew = {sna[sna['sex']==1]['age'].skew():.4f}", fontsize=15)
ax1.set_xlabel("Age", fontsize=15)
sns.histplot(sna[sna['sex']==2]['age'], ax=ax2, color='#ffb8c7', kde=True, stat='density')
ax2.axvline(x=np.median(sna[sna['sex']==2]['age']), color='#ffb8c7')
ax2.set_title('Single Females\n'+
f"Median = {np.median(sna[sna['sex']==2]['age'])}\n"+
f"Skew = {sna[sna['sex']==2]['age'].skew():.4f}", fontsize=15)
ax2.set_xlabel("Age", fontsize=15)
plt.suptitle("Age Difference Between Males & Females", fontsize=18, y=1.1)
def test_norm(feature):
    from scipy.stats import shapiro
    stat, p = shapiro(feature)
    alpha = 0.05
    if p > alpha:
        return f'Normal distribution (p={p:.4f})'
    else:
        return f'Not a normal distribution (p={p:.4f})'
def non_param_sig(test1, test2):
    from scipy.stats import mannwhitneyu
    # Two-sided test: H0 is that both samples come from the same distribution
    stat, p = mannwhitneyu(test1,
                           test2, alternative='two-sided')
    alpha = 0.05
    if p > alpha:
        return (f'Same distribution (fail to reject H0, pvalue={p:.4f})')
    else:
        return (f'Different distribution (reject H0), pvalue={p:.4f}')
print(f"Age of Males: {test_norm(sna[sna['sex']==1]['age'])}")
print(f"Age of Females: {test_norm(sna[sna['sex']==2]['age'])}")
print(non_param_sig(sna[sna['sex']==1]['age'], sna[sna['sex']==2]['age']))
Age of Males: Not a normal distribution (p=0.0000)
Age of Females: Not a normal distribution (p=0.0000)
Different distribution (reject H0), pvalue=0.0000
But wait, before you book that ticket to La Union, let us first look at the age demographics to see if they match your ideal person. The median age is 44 for males and 57 for females. Are these two groups significantly different? The Shapiro-Wilk test shows that neither age distribution is normal, so we need a non-parametric test. We used the Mann-Whitney U test, which showed that the two age distributions are significantly different! This is supported by the distribution plots: the male ages are slightly skewed to the right, meaning most males are concentrated at younger ages with a tail extending toward older ages, while the female ages are skewed to the left, meaning most females are concentrated at older ages with a tail extending toward younger ages. Considering conditions in the Philippines, a possible explanation is that males tend to live alone from a young age until middle age and probably settle down afterwards. Females, by contrast, probably live with their families when younger and end up alone once their family members have passed away.
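To make the reading of the skew values concrete, here is a minimal sketch with invented ages (not FIES data). It only illustrates how the sign of the skew relates to the mean and the median; `scipy.stats.skew` is used instead of the pandas `.skew()` method, so the exact values would differ slightly from the plot titles.

```python
import numpy as np
from scipy.stats import skew

# Made-up ages: most values are low, with a few much older people (long right tail)
right_skewed = np.array([22, 24, 25, 27, 28, 30, 31, 33, 60, 75])
# Mirror image: most values are high, with a few much younger people (long left tail)
left_skewed = 100 - right_skewed

for name, ages in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    print(f"{name}: mean={ages.mean():.1f}, median={np.median(ages):.1f}, "
          f"skew={skew(ages):.2f}")
```

In the right-skewed sample the long tail pulls the mean above the median and the skew is positive; the mirrored sample shows the opposite.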
What other things will we find out that are the same or different between men and women? We will explore the topics of income, food, and other expenses to find out.
sex_dict = {
1: 'male',
2: 'female'
}
maj_inc_dict = {
1: 'Wage/Salaries',
2: 'Entrepreneurial Activities',
3: 'Other sources of Income'
}
min_inc_dict = {
1: 'Wage/Salary from Agri. Activity',
2: 'Wage/Salary from Non-Agri. Activity',
3: 'Crop Farming and Gardening',
4: 'Livestock and Poultry Raising',
5: 'Fishing',
6: 'Forestry and Hunting',
7: 'Wholesale and Retail',
8: 'Manufacturing',
9: 'Community, etc. services',
10: 'Transport and Communication',
11: 'Mining',
12: 'Construction',
13: 'Entrep. Activity N.E.C.',
14: 'Net Share of Crops and others',
15: 'Assistance from Abroad',
16: 'Assistance from Domestic Source',
17: 'Rental of Lands and other Properties',
18: 'Interests from Banks / loans',
19: 'Pensions and retirements benefits',
20: 'Dividend from Investments',
21: 'Rental value of owner-occupied dwelling unit for income',
22: 'Income from family sustenance activities',
23: 'Received as Gifts',
24: 'Other Income'
}
# Example of the Shapiro-Wilk Normality Test
from scipy.stats import shapiro
def shapiro_wilks(data):
    stat, p = shapiro(data)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably Gaussian')
    else:
        print('Probably not Gaussian')
# Example of the Kruskal-Wallis H Test
from scipy.stats import kruskal
def kruskal_wallis(*data):
    stat, p = kruskal(*data)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably the same distribution')
    else:
        print('Probably different distributions')
from scipy.stats import mannwhitneyu
def mann_whitney(*data):
    stat, p = mannwhitneyu(*data)
    print('stat=%.3f, p=%.3f' % (stat, p))
    if p > 0.05:
        print('Probably the same distribution')
    else:
        print('Probably different distributions')
df = pd.read_csv('fies2015_eda.csv')
sex = 'Household Head Sex (2nd visit only)'
income = 'Total Income'
maj_income = 'Major Grouping of Main Source of Income'
min_income = 'Detailed Grouping of Main Source of income'
ms_df = (df[(df['Family SIze'] == 1) &
(df['Household Head Sex (2nd visit only)'] == 1) &
(df['Household Head Marital Status (2nd visit only)'] == 1)])
fs_df = (df[(df['Family SIze'] == 1) &
(df['Household Head Sex (2nd visit only)'] == 2) &
(df['Household Head Marital Status (2nd visit only)'] == 1)])
How does the pay gap affect our single men and women? Using the Shapiro-Wilk test, the group determined that the income data for men and women are not normally distributed. Because the data are not normal, we ran the non-parametric Mann-Whitney U test.
query = income
print('Mean Income Male:', ms_df[query].mean())
print('Mean Income Female:', fs_df[query].mean())
Mean Income Male: 116892.49176954733
Mean Income Female: 141681.43027888445
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 8));
# histplot with kde=True replaces the deprecated sns.distplot
sns.histplot(np.log(ms_df[query]),
             ax=ax1, color='#8eacd0', kde=True, stat='density', alpha=.7);
ax1.set_title('Single Males');
sns.histplot(np.log(fs_df[query]),
             ax=ax2, color='#ffb8c7', kde=True, stat='density', alpha=.7);
ax2.set_title('Single Females');
plt.suptitle("Income Difference Between Males & Females", fontsize=18);
The data show that the mean income for single men is PHP 116,892, while the mean income for single women is a much higher PHP 141,681. When log-transformed, the income for women looks approximately normal; the income for men follows a different shape, which suggests that the male income data may contain many outliers.
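The effect of the log transform can be illustrated with synthetic data. The sketch below assumes log-normally distributed incomes purely for illustration (the FIES incomes need not follow this shape) and compares the Shapiro-Wilk statistic before and after taking logs; a W closer to 1 means the sample looks more normal.

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(0)
# Synthetic incomes drawn from a log-normal distribution (an assumption for
# illustration only; the FIES incomes need not follow this shape)
income = rng.lognormal(mean=11.5, sigma=0.8, size=500)

w_raw, p_raw = shapiro(income)
w_log, p_log = shapiro(np.log(income))

# The long right tail pulls the mean above the median on the raw scale
print(f"mean={income.mean():,.0f}  median={np.median(income):,.0f}")
print(f"raw: W={w_raw:.3f}, p={p_raw:.4f}")
print(f"log: W={w_log:.3f}, p={p_log:.4f}")
```

Because a few very large incomes dominate the mean, comparing medians or using a rank-based test is usually the safer summary for data of this kind.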
# Count households per major income source, using one shared category order
# so the male and female traces line up on the same radar axes
categories = sorted(maj_inc_dict.values(), reverse=True)
m_maj_inc = ms_df[maj_income].replace(maj_inc_dict)
mmaj_vals = (m_maj_inc.value_counts()
             .reindex(categories, fill_value=0).values)
f_maj_inc = fs_df[maj_income].replace(maj_inc_dict)
fmaj_vals = (f_maj_inc.value_counts()
             .reindex(categories, fill_value=0).values)
fig = go.Figure(layout_title_text="Sources of Income")
fig.add_trace(go.Scatterpolar(
r=mmaj_vals,
theta=categories,
fill='toself',
name='Male',
fillcolor='#8eacd0',
opacity=.7
))
fig.add_trace(go.Scatterpolar(
r=fmaj_vals,
theta=categories,
fill='toself',
name='Female',
fillcolor='#ffb8c7',
opacity=.7
))
fig.update_layout(
polar=dict(
radialaxis=dict(
visible=True,
range=[0, 300]
)),
showlegend=True
)
fig.show()
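# Note: this uses every household in df, not only the single-and-alone subset.
# Work on a copy and assign back instead of calling replace(inplace=True) on a slice.
df_box = df[[sex, income, maj_income]].copy()
df_box[sex] = df_box[sex].replace(sex_dict)
df_box[maj_income] = df_box[maj_income].replace(maj_inc_dict)
df_box[income] = np.log(df_box[income])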
plt.figure(figsize = (16,8))
ax = sns.boxplot(x=maj_income, y=income, hue=sex,
data=df_box, palette=['#ffb8c7', '#8eacd0']);
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.);
plt.xticks(rotation=0);
plt.title('Sources of Income', y=1.05, fontsize = 13)
ax.set_xlabel('');
For men, income from wages and from ‘Entrepreneurial Activities’ does not follow a normal distribution, but income from ‘Other sources of income’ does. For women, on the other hand, income from wages and from ‘Other sources of income’ does not follow a normal distribution, but income from entrepreneurial activities does.
Drilling down further into the top 10 sources of income per sex, both men and women earn their highest median incomes from ‘Dividend from Investments’ and ‘Construction’. For men, the next-highest median incomes come from ‘Rental of Lands and other Properties’, ‘Interests from Banks / loans’, then ‘Entrep. Activity N.E.C.’. For women, they come from ‘Assistance from Abroad’, ‘Entrep. Activity N.E.C.’, and ‘Wage/Salary from Non-Agri. Activity’.
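# Work on a copy and assign back instead of calling replace(inplace=True) on a slice
df_mingrp = df[[sex, income, maj_income, min_income]].copy()
df_mingrp[sex] = df_mingrp[sex].replace(sex_dict)
df_mingrp[maj_income] = df_mingrp[maj_income].replace(maj_inc_dict)
df_mingrp[min_income] = df_mingrp[min_income].replace(min_inc_dict)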
m_df_mingrp = df_mingrp[df_mingrp[sex] == 'male']
m_df_mingrp = m_df_mingrp.drop(columns=sex)
f_df_mingrp = df_mingrp[df_mingrp[sex] == 'female']
f_df_mingrp = f_df_mingrp.drop(columns=sex)
m_df_mingrp2 = (pd.DataFrame
(m_df_mingrp.groupby([min_income])[income].median()
.sort_values(ascending=False)).round(decimals=2)
[:10].rename(columns={income: 'Median Income'}))
print('Male')
m_df_mingrp2
Male
| Detailed Grouping of Main Source of income | Median Income |
|---|---|
| Dividend from Investments | 749500.0 |
| Construction | 374745.0 |
| Rental of Lands and other Properties | 331500.0 |
| Interests from Banks / loans | 302596.0 |
| Entrep. Activity N.E.C. | 281877.5 |
| Pensions and retirements benefits | 266973.0 |
| Community, etc. services | 264604.0 |
| Assistance from Abroad | 239968.0 |
| Wage/Salary from Non-Agri. Activity | 210288.0 |
| Wholesale and Retail | 201343.5 |
f_df_mingrp2 = (pd.DataFrame(f_df_mingrp.groupby([min_income])[income].median()
.sort_values(ascending=False)).round(decimals=2)
[:10].rename(columns={income: 'Median Income'}))
print('Female')
f_df_mingrp2
Female
| Detailed Grouping of Main Source of income | Median Income |
|---|---|
| Dividend from Investments | 1375970.0 |
| Construction | 455123.0 |
| Assistance from Abroad | 303122.5 |
| Entrep. Activity N.E.C. | 292025.0 |
| Wage/Salary from Non-Agri. Activity | 236352.0 |
| Rental of Lands and other Properties | 233235.0 |
| Community, etc. services | 224520.0 |
| Pensions and retirements benefits | 176280.0 |
| Transport and Communication | 167533.0 |
| Wholesale and Retail | 163514.0 |
W_REGN= {
1: 'I - Ilocos Region',
2: 'II - Cagayan Valley',
3: 'III - Central Luzon',
5: 'V - Bicol Region',
6: 'VI - Western Visayas',
7: 'VII - Central Visayas',
8: 'VIII - Eastern Visayas',
9: 'IX - Zamboanga Peninsula',
10: 'X - Northern Mindanao',
11: 'XI - Davao Region',
12: 'XII - SOCCSKSARGEN',
13: 'NCR',
14: 'CAR',
15: 'ARMM',
16: 'Caraga',
41: 'IVA - CALABARZON',
42: 'IVB - MIMAROPA'
}
# create df for male and female
df_alone = fies[(fies['fsize']==1) & (fies['ms']==1)]
df_m = df_alone[df_alone['sex']==1]
df_f = df_alone[df_alone['sex']==2]
# total food / food at home / food outside, as raw amounts and as a
# percentage of total expenditures
food_amount_cols = ['tfood', 'tfoodhome', 'tfoodoutside']

def food_frames(d, label):
    """Return (raw amounts, % of total expenditures) for one sex."""
    raw = d[food_amount_cols].copy()
    pct = raw.div(d['ttotex'], axis=0) * 100
    raw['sex'] = label
    pct['sex'] = label
    return raw, pct

df_f_, f_food = food_frames(df_f, 'Female')
df_m_, m_food = food_frames(df_m, 'Male')
# Female rows come first so the hue order matches the plot palette
df_food = pd.concat([df_f_, df_m_])
df_food_prct = pd.concat([f_food, m_food])
# plot
fig, ax = plt.subplots(1,2, figsize=(20,10))
ax[0].set_title('Food Expenditures')
sns.boxplot(x='variable', y='value', hue='sex', data=pd.melt(df_food, id_vars='sex'), ax=ax[0], palette=['#ffb8c7','#8eacd0'])
ax[1].set_title('Food Expenditures as Percentage of Total Expenditures')
sns.boxplot(x='variable', y='value', hue='sex', data=pd.melt(df_food_prct, id_vars='sex'), ax=ax[1], palette=['#ffb8c7','#8eacd0']);
Looking at the raw amount of food expenditures for males and females, we can see that females have a higher median value for total food expenditures in general and total food expenditures outside compared to males. But once we look at food expenditures as a percentage of total expenditures (representing how males and females budget their expenses) we find that the males have a median value higher than females for all food expenditures. Interestingly, this is aligned with the findings of this 2020 article (https://directionscu.org/2020/01/24/how-men-and-women-manage-money-differently/) on a consumer expenditure survey by the U.S. Bureau of Labor Statistics, where they found that single men spend more on their food annually (4173 USD) compared to single women (3680 USD). But is this statistically significant? Let’s find out.
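The gap between raw amounts and budget shares described above can be illustrated with two invented households. This is only a sketch with made-up figures; the column names `tfood` and `ttotex` mirror the notebook, while `food_share_pct` is a hypothetical name.

```python
import pandas as pd

# Made-up annual figures in PHP for two people living alone
toy = pd.DataFrame({
    "sex":    ["Female", "Male"],
    "tfood":  [40_000, 36_000],    # total food spending
    "ttotex": [100_000, 75_000],   # total spending
})

# Food spending as a percentage of total spending
toy["food_share_pct"] = toy["tfood"] / toy["ttotex"] * 100
print(toy)
```

Here the woman spends more on food in absolute terms, yet the man devotes a larger share of his budget to it, which is the pattern the two boxplots contrast.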
df_norm = df_alone[['w_regn', 'sex']].copy()
df_norm['tfood'] = df_alone['tfood']/df_alone['ttotex']
df_norm['tfoodhome'] = df_alone['tfoodhome']/df_alone['ttotex']
df_norm['tfoodoutside'] = df_alone['tfoodoutside']/df_alone['ttotex']
df_norm['talcohol'] = df_alone['talcohol']/df_alone['ttotex']
df_norm['tcoffee'] = df_alone['tcoffee']/df_alone['ttotex']
df_norm['region'] = df_alone['w_regn'].map(W_REGN)
df_norm['sex2'] = df_norm['sex'].map({1:'Male', 2:'Female'})
Based on the Kruskal-Wallis test, Total Food Consumed and Total Food Consumed at Home differ significantly between the sexes, with males allotting a larger share of their budget than females. On average, males also spend more on food than females, at Php 36,000+ vs Php 35,200+. In their budgets, this accounts for 47.7% of all expenses for males but only 41.5% for females. The test showed that this difference between the two sexes is statistically significant.
However, we cannot generalize that most Filipino males spend more, because at the regional level most regions show no statistically significant difference. Only in Regions V, X, II, and NCR do males allot a significantly larger share of their budget than females to total food consumption and food consumed at home.
Interestingly, for eating out, Total Food Consumed Outside shows no statistically significant difference between the genders for the Philippines as a whole. This is consistent across all regions except NCR, where males spend an average of 20% of their budget while females spend only 10%.
df_stats = pd.DataFrame()
df_spy = pd.DataFrame()
food_cols = ['tfood', 'tfoodhome', 'tfoodoutside', 'talcohol']
df_stats['field'] = food_cols
for reg in df_norm['w_regn'].unique():
    reg_test = []
    spy_val = []
    for each in food_cols:
        # Budget shares of males (x) and females (y) within this region
        x = df_norm[(df_norm['sex']==1)&(df_norm['w_regn']==reg)][each]
        y = df_norm[(df_norm['sex']==2)&(df_norm['w_regn']==reg)][each]
        try:
            stat, p = stats.kruskal(x, y)
            if p > 0.05:
                reg_test.append("same")
                spy_val.append(0)
            else:
                reg_test.append("diff")
                spy_val.append(1)
        except Exception:
            # The test can fail, e.g. when every value in both samples is identical
            reg_test.append("all same")
            spy_val.append(0)
    df_stats[reg] = reg_test
    df_spy[reg] = spy_val
df_spy.columns = df_spy.columns.map(W_REGN)
food_names = ['Total Food Consumed', 'Food Consumed at Home', 'Food Consumed Outside', 'Alcohol Consumed']
df_spy.index = food_names
f, ax = plt.subplots(figsize=(8, 8))
ax.spy(df_spy)
ax.set_xticks(range(0,df_spy.shape[1]))
ax.set_yticks(range(0,df_spy.shape[0]))
ax.set_xticklabels(df_spy.columns, rotation=90)
ax.set_yticklabels(df_spy.index, rotation=0)
plt.show()
*Black boxes signify a significant difference
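The grid above is built from one Kruskal-Wallis test per region and expense category. A compact sketch of the same idea on invented data, where region "A" is constructed to differ and region "B" is not (the names `toy`, `differs`, and `flags` are hypothetical and not part of the notebook):

```python
import pandas as pd
from scipy.stats import kruskal

# Toy data: in region "A" the two sexes clearly differ, in region "B" they overlap.
# Values stand in for a budget share such as tfood / ttotex.
rows = []
for i in range(10):
    rows.append({"region": "A", "sex": 1, "tfood": 0.50 + 0.01 * i})
    rows.append({"region": "A", "sex": 2, "tfood": 0.30 + 0.01 * i})
    rows.append({"region": "B", "sex": 1, "tfood": 0.40 + 0.02 * i})
    rows.append({"region": "B", "sex": 2, "tfood": 0.41 + 0.02 * i})
toy = pd.DataFrame(rows)

def differs(group, col, alpha=0.05):
    """Return 1 if the male and female samples differ at level alpha, else 0."""
    males = group.loc[group["sex"] == 1, col]
    females = group.loc[group["sex"] == 2, col]
    _, p = kruskal(males, females)
    return int(p <= alpha)

# One flag per region: 1 would be drawn as a black box, 0 as blank
flags = {region: differs(group, "tfood") for region, group in toy.groupby("region")}
print(flags)
```

With only two groups per test, Kruskal-Wallis answers the same question as a two-sided Mann-Whitney U test, so either could be used for this grid.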
We often wondered, how about alcohol consumption? For sure males spend more on alcohol...or do they?
Comparing the percentage of their budgets that males and females spend on alcohol, males do spend more than females. Interestingly, in Regions IVB (MIMAROPA) and IX (Zamboanga Peninsula), the spending of males and females shows no significant difference. In fact, across all the food expense categories discussed, those two regions together with ARMM show that single males and females spend similarly on food and alcohol.
The study by the U.S. Bureau of Labor Statistics previously mentioned also supports these findings, where they found that single men spent more than double on alcoholic beverages compared to single women. This may be explained by a study published in Biological Psychiatry (https://www.sciencedaily.com/releases/2010/10/101018112308.htm#:~:text=Despite%20similar%20consumptions%20of%20alcohol,pleasure%2C%20reinforcement%20and%20addiction%20formation) showing that dopamine release may be a factor for greater alcohol consumption in men than women. Men showed greater dopamine release when consuming alcoholic drinks compared to women, which could contribute to habit formation.
# Select several columns with a list; tuple-style selection is no longer supported by pandas
df_mean = df_norm.groupby(['region', 'sex'])[['tfood', 'tfoodhome', 'tfoodoutside', 'talcohol']].mean()*100
Our group also explored other expenses of singles in the Philippines. We look at three primary categories of expenditure: Health and Wellness (HnW), Leisure Expenses, and Donations. We first explore each category at a high level and identify whether sex plays a role in how single Filipinos allocate their budget. Afterwards, we dive into a regional view of each category to check whether the behaviour of single Filipinos varies per region.
For the code used to determine statistical significance, see the supplementary notebook Insight_4_GimmicksNOthers.ipynb.
df_hw_entire = pd.read_csv('df_hw_entire.csv')
width = 0.35 # the width of the bars
fig, ax = plt.subplots(figsize = (16,8))
df_m = df_hw_entire[df_hw_entire.sex == 1]
df_f = df_hw_entire[df_hw_entire.sex == 2]
x = np.arange(len(df_m))
p1 = ax.bar(x - width/2 - 0.05, df_m.value, width,
color = "#8eacd0", label = 'Men')
p2 = ax.bar(x + width/2 + 0.05, df_f.value, width,
color = "#ffb8c7", label = 'Women')
ax.set(xticks=x, xticklabels=['Medical Products', 'HnW Misc.'])
signif = ['NOT significant', 'Significant']
for i in range(len(p1)):
    if signif[i] == 'NOT significant':
        p1[i].set(fill=False, linewidth = 3, hatch = '//',
                  edgecolor = "#8eacd0")
        p2[i].set(fill=False, linewidth = 3, hatch = '//',
                  edgecolor = "#ffb8c7")
plt.ylabel('Health and Wellness\nExpenses / Total Expenses (%)')
plt.xlabel('Health and Wellness Expenses')
from matplotlib.patches import Patch
from matplotlib.lines import Line2D
["#8eacd0", "#ffb8c7"]
legend_elements = [Patch(facecolor="#8eacd0", edgecolor="#8eacd0",
label='Male'),
Patch(facecolor="#ffb8c7", edgecolor="#ffb8c7",
label='Female'),
Patch(fill=False, linewidth = 1, hatch = '////',
edgecolor = "k",
label = 'Not Significant')]
ax.legend(handles=legend_elements, loc='upper right')
pass
We also analyzed the expenses of single women and single men on Health & Wellness (HnW). Expenses such as pharmaceutical products, nutritional supplements, and other medical preparation products were grouped under Medical Products, while expenses such as insurance and personal care services (for example, hair treatments) were grouped under HnW Misc. Expenses such as dental services or in-patient medical services were also analyzed but were not included in the final report, as only a minute portion of the population availed of such services. Using the Mann-Whitney U test, we found that single males and females who live alone in the FIES dataset spend significantly differently on HnW Misc. but similarly on Medical Products: for Medical Products, the test could not reject the null hypothesis that spending is the same for both sexes.
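The decision rule behind the "Significant" and "Not Significant" labels can be sketched as follows. The numbers are invented budget shares, and the 0.05 threshold is assumed to match the one used elsewhere in this notebook; the supplementary notebook may differ in detail.

```python
from scipy.stats import mannwhitneyu

# Invented budget shares (%) for two small groups
men = [1.0, 1.2, 1.5, 1.7, 2.0, 2.1, 2.4, 2.6]
women = [3.0, 3.1, 3.3, 3.6, 3.8, 4.0, 4.2, 4.5]

# H0: both groups come from the same distribution (sex makes no difference)
stat, p = mannwhitneyu(men, women, alternative="two-sided")

alpha = 0.05
if p <= alpha:
    decision = "reject H0: spending differs between the groups"
else:
    decision = "fail to reject H0: no evidence of a difference"
print(f"U={stat:.1f}, p={p:.4f} -> {decision}")
```

Failing to reject H0 does not prove that the two groups spend identically; it only means the data give no evidence of a difference at the chosen threshold.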
df_hw_reg_diff = pd.read_csv('df_hw_reg_diff.csv')
new_df = pd.read_csv('new_df_hw.csv')
width = 0.35 # the width of the bars
fig, ax = plt.subplots(figsize = (16,8))
df_m = new_df[new_df.sex == 1]
df_f = new_df[new_df.sex == 2]
x = np.arange(len(df_m))
p1 = ax.bar(x - width/2 - 0.05, df_m.med_products_pcntg, width,
color = "#8eacd0", label = 'Men')
p2 = ax.bar(x + width/2 + 0.05, df_f.med_products_pcntg, width,
color = "#ffb8c7", label = 'Women')
ax.set(xticks=x, xticklabels=list(W_REGN.values()))
plt.xticks(rotation=90)
signif = df_hw_reg_diff.med_products_pcntg
for i in range(len(p1)):
    if signif[i] == 'NOT significant':
        p1[i].set(fill=False, linewidth = 3, hatch = '//', edgecolor = "#8eacd0")
        p2[i].set(fill=False, linewidth = 3, hatch = '//', edgecolor = "#ffb8c7")
plt.ylabel('Health and Wellness\nMedical Product Expenses / Total Expenses (%)')
plt.xlabel('Regions')
legend_elements = [Patch(facecolor="#8eacd0", edgecolor="#8eacd0",
label='Male'),
Patch(facecolor="#ffb8c7", edgecolor="#ffb8c7",
label='Female'),
Patch(fill=False, linewidth = 1, hatch = '////',
edgecolor = "k",
label = 'Not Significant')]
ax.legend(handles=legend_elements, loc='upper right')
pass
Although the nationwide difference in Medical Products expenditure between males and females was not statistically significant, a regional view of the same expenditure tells a different story for some regions. Performing the same statistical tests used in Section 5.1.0.1, we observed that the difference in expenditure on medical products between single males and females is statistically significant in the following regions: Ilocos Region, Cagayan Valley, Central Luzon, Bicol Region, Central Visayas, Zamboanga Peninsula, CAR, Caraga, and CALABARZON. Among these significant differences per the Mann-Whitney U test and Student's t-test, females spend more on medical products than males in 8 out of 9 regions.
df_hw_reg_diff = pd.read_csv('df_hw_reg_diff.csv')
new_df = pd.read_csv('new_df_hw.csv')
width = 0.35 # the width of the bars
fig, ax = plt.subplots(figsize = (16,8))
df_m = new_df[new_df.sex == 1]
df_f = new_df[new_df.sex == 2]
x = np.arange(len(df_m))
p1 = ax.bar(x - width/2 - 0.05, df_m.hw_misc_pcntg, width,
color = "#8eacd0", label = 'Men')
p2 = ax.bar(x + width/2 + 0.05, df_f.hw_misc_pcntg, width,
color = "#ffb8c7", label = 'Women')
ax.set(xticks=x, xticklabels=list(W_REGN.values()))
plt.xticks(rotation=90)
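# Assumption: df_hw_reg_diff has an hw_misc_pcntg column mirroring new_df,
# so the hatching reflects the HnW Misc. tests rather than Medical Products
signif = df_hw_reg_diff.hw_misc_pcntg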
for i in range(len(p1)):
    if signif[i] == 'NOT significant':
        p1[i].set(fill=False, linewidth = 3, hatch = '//', edgecolor = "#8eacd0")
        p2[i].set(fill=False, linewidth = 3, hatch = '//', edgecolor = "#ffb8c7")
plt.ylabel('Health and Wellness\nMiscellaneous Expenses / Total Expenses (%)')
plt.xlabel('Regions')
legend_elements = [Patch(facecolor="#8eacd0", edgecolor="#8eacd0",
label='Male'),
Patch(facecolor="#ffb8c7", edgecolor="#ffb8c7",
label='Female'),
Patch(fill=False, linewidth = 1, hatch = '////',
edgecolor = "k",
label = 'Not Significant')]
ax.legend(handles=legend_elements, loc='upper right')
pass
We also analyzed, at a more granular level, miscellaneous health and wellness expenses such as payments for insurance and for hair treatments in salons. Our tests show that most differences in this analysis are not statistically significant. However, it is notable that regions in Luzon showed significant differences in Miscellaneous Health and Wellness Expenses.
df_misc_entire = pd.read_csv('df_misc_entire.csv')
width = 0.35 # the width of the bars
fig, ax = plt.subplots(figsize = (16,8))
df_m = df_misc_entire[df_misc_entire.sex == 1]
df_f = df_misc_entire[df_misc_entire.sex == 2]
x = np.arange(len(df_m))
p1 = ax.bar(x - width/2 - 0.05, df_m.value, width,
            color="#8eacd0", label='Men')
p2 = ax.bar(x + width/2 + 0.05, df_f.value, width,
            color="#ffb8c7", label='Women')
ax.set(xticks=x, xticklabels=['Special Occasion Expenses', 'Fashion Expenses'])
signif = ['Significant', 'NOT significant']
# Draw hatched, unfilled bars for categories where the difference is not significant
for i in range(len(p1)):
    if signif[i] == 'NOT significant':
        p1[i].set(fill=False, linewidth=3, hatch='//',
                  edgecolor="#8eacd0")
        p2[i].set(fill=False, linewidth=3, hatch='//',
                  edgecolor="#ffb8c7")
plt.ylabel('Leisure Expenses / Total Expenses (%)')
plt.xlabel('Leisure Expenses')
legend_elements = [Patch(facecolor="#8eacd0", edgecolor="#8eacd0",
                         label='Male'),
                   Patch(facecolor="#ffb8c7", edgecolor="#ffb8c7",
                         label='Female'),
                   Patch(fill=False, linewidth=1, hatch='////',
                         edgecolor="k",
                         label='Not Significant')]
ax.legend(handles=legend_elements, loc='upper right')
pass
Another field that our team explored is Leisure Expenses, namely Special Occasion Expenses (i.e. food and alcoholic beverages for parties and special events) and Fashion Expenses (i.e. clothes, shoes, and jewelry). Expenses on gadgets were also initially explored, but since only a fraction of respondents possessed gadgets, further analysis of this field was deferred to future studies. At a high level, our statistical tests suggest that the sex of a single person plays a role in how much they spend on special occasions, but that spending on fashion does not differ significantly between the sexes.
df_misc_reg_diff = pd.read_csv('df_misc_reg_diff.csv')
new_df2 = pd.read_csv('new_df_misc.csv')
width = 0.35 # the width of the bars
fig, ax = plt.subplots(figsize = (16,8))
df_m = new_df2[new_df2.sex == 1]
df_f = new_df2[new_df2.sex == 2]
x = np.arange(len(df_m))
p1 = ax.bar(x - width/2 - 0.05, df_m.spec_expense_pcntg, width,
            color="#8eacd0", label='Men')
p2 = ax.bar(x + width/2 + 0.05, df_f.spec_expense_pcntg, width,
            color="#ffb8c7", label='Women')
ax.set(xticks=x, xticklabels=list(W_REGN.values()))
plt.xticks(rotation=90)
signif = df_misc_reg_diff.spec_expense_pcntg
# Draw hatched, unfilled bars for regions where the difference is not significant
for i in range(len(p1)):
    if signif[i] == 'NOT significant':
        p1[i].set(fill=False, linewidth=3, hatch='//', edgecolor="#8eacd0")
        p2[i].set(fill=False, linewidth=3, hatch='//', edgecolor="#ffb8c7")
plt.ylabel('Leisure Expense\nSpecial Occasion Expenses / Total Expenses (%)')
plt.xlabel('Regions')
legend_elements = [Patch(facecolor="#8eacd0", edgecolor="#8eacd0",
                         label='Male'),
                   Patch(facecolor="#ffb8c7", edgecolor="#ffb8c7",
                         label='Female'),
                   Patch(fill=False, linewidth=1, hatch='////',
                         edgecolor="k",
                         label='Not Significant')]
ax.legend(handles=legend_elements, loc='upper right')
pass
An analysis of the budget that single people in each region allocate to Special Occasions shows that males generally allocate more of their budget to food and alcoholic drinks for special occasions than females do. This may be consistent with the findings in Section 4 of this report. This section differs from Section 4, however, in that this visualization highlights the budget allocated to food and drinks for special events such as parties and celebrations.
width = 0.35 # the width of the bars
fig, ax = plt.subplots(figsize = (16,8))
df_m = new_df2[new_df2.sex == 1]
df_f = new_df2[new_df2.sex == 2]
x = np.arange(len(df_m))
p1 = ax.bar(x - width/2 - 0.05, df_m.fashion_misc_pcntg, width,
            color="#8eacd0", label='Men')
p2 = ax.bar(x + width/2 + 0.05, df_f.fashion_misc_pcntg, width,
            color="#ffb8c7", label='Women')
ax.set(xticks=x, xticklabels=list(W_REGN.values()))
plt.xticks(rotation=90)
signif = df_misc_reg_diff.fashion_misc_pcntg
# Draw hatched, unfilled bars for regions where the difference is not significant
for i in range(len(p1)):
    if signif[i] == 'NOT significant':
        p1[i].set(fill=False, linewidth=3, hatch='//', edgecolor="#8eacd0")
        p2[i].set(fill=False, linewidth=3, hatch='//', edgecolor="#ffb8c7")
plt.ylabel('Leisure Expense\nFashion Expenses / Total Expenses (%)')
plt.xlabel('Regions')
legend_elements = [Patch(facecolor="#8eacd0", edgecolor="#8eacd0",
                         label='Male'),
                   Patch(facecolor="#ffb8c7", edgecolor="#ffb8c7",
                         label='Female'),
                   Patch(fill=False, linewidth=1, hatch='////',
                         edgecolor="k",
                         label='Not Significant')]
ax.legend(handles=legend_elements, loc='upper right')
pass
Although the nationwide difference in Fashion Expenses between single males and single females is not statistically significant, some regions do show a significant difference. Contrary to popular belief, single males spend more on fashion items such as clothing, shoes, and jewelry than single females in a number of regions.
Our group did find some noticeable differences between single males and females in the Philippines. We only selected the following items for our comparisons: income, food consumption and alcohol consumption.
Based on these categories, there are both significant and non-significant differences between the two genders.
The data show that the median income for single men is PHP 79,997, while the median income for single women is a much higher PHP 96,227. When log-transformed, the women's income data is normally distributed; the men's income, however, follows a different distribution, which suggests that the male income data may contain many outliers.
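One way such a log-transform check could be carried out is sketched below. The text does not state which normality test the group used, so the Shapiro-Wilk test here is an assumption, as are the helper name `log_income_normality`, the 0.05 threshold, and the synthetic income samples (which are not the survey data).

```python
import numpy as np
from scipy import stats


def log_income_normality(income, alpha=0.05):
    """Shapiro-Wilk test on log-transformed income (assumed test choice)."""
    log_income = np.log(np.asarray(income, dtype=float))
    stat, p_value = stats.shapiro(log_income)
    return {
        "statistic": stat,
        "p_value": p_value,
        # True means the log-transformed data look non-normal at this alpha.
        "reject_normality": bool(p_value < alpha),
    }


# Synthetic samples, for illustration only.
rng = np.random.default_rng(42)
# Log-normal incomes: the log transform yields a normal sample by construction.
lognormal_income = rng.lognormal(mean=11.4, sigma=0.6, size=500)
# Same idea plus a cluster of very high earners acting as outliers.
heavy_tail_income = np.concatenate([
    rng.lognormal(mean=11.2, sigma=0.6, size=450),
    rng.lognormal(mean=14.5, sigma=0.3, size=50),
])

check_lognormal = log_income_normality(lognormal_income)
check_heavy = log_income_normality(heavy_tail_income)
print(check_lognormal)
print(check_heavy)
```

A Q-Q plot of the log-transformed values is a useful companion to the test, since with large samples even small departures from normality can produce a low p-value.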
Looking deeper into the singles' sources of income, men derive more of their income from wages/salaries and ‘Entrepreneurial Activities’ than women do. Women, on the other hand, derive more of their income from ‘Other sources of income’ than men do.
Looking at the raw amounts of food expenditure, females have a higher median value than males both for total food expenditures in general and for total food expenditures outside. But once we look at food expenditures as a percentage of total expenditures (representing how males and females budget their expenses), we find that males have a higher median value than females for all food expenditures.
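The apparent reversal between raw amounts and budget shares is easy to reproduce with a toy example. The figures below are made up for illustration, and the column names `food_exp`, `total_exp`, and `food_share_pct` are hypothetical; only the sex coding (1 = male, 2 = female) follows the notebook.

```python
import pandas as pd

# Made-up household figures in PHP, three single males and three single females.
demo = pd.DataFrame({
    "sex": [1, 1, 1, 2, 2, 2],
    "food_exp": [30000, 36000, 42000, 35000, 40000, 45000],
    "total_exp": [60000, 75000, 90000, 90000, 100000, 125000],
})

# Food spending as a percentage of each person's total spending.
demo["food_share_pct"] = 100 * demo["food_exp"] / demo["total_exp"]

# Median raw amount and median share per sex.
medians = demo.groupby("sex")[["food_exp", "food_share_pct"]].median()
print(medians)
```

In this toy data the females spend more on food in absolute terms, but because their total expenditures are larger still, food takes up a smaller share of their budget than it does for the males.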
However, in the end, we go back to the group's methodology. Although the group discovered and discussed insights that fall outside conventional thought, it had to go out of its way to identify possible areas where the two genders are statistically different.
Our final conclusion? There are definitely more similarities between single men and women, despite their statistically significant differences. So in the interest of peace, let's make love and not war.